- What is R?
- Brief Intro To RStudio & R Concepts
- R Objects
- The R Workspace
- R Packages
- Getting Data Into R
- Basic Plotting
2 + 2
5 * 4
2^3
2 + 3 * 4/(5 + 3) * 15/2^2 + 3 * 4^2
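The longer expression above is a handy check on R's order of operations: parentheses first, then exponents, then multiplication and division from left to right, then addition. A small sketch that splits it into pieces (the names `middle`, `last`, `step_by_step`, and `one_line` are made up for illustration):

```r
# Evaluate the expression piece by piece to see operator precedence.
middle <- 3 * 4 / (5 + 3) * 15 / 2^2   # (12 / 8) * 15 / 4
last   <- 3 * 4^2                      # 3 * 16, because ^ binds tighter than *
step_by_step <- 2 + middle + last

# The one-line version from above gives the same answer.
one_line <- 2 + 3 * 4/(5 + 3) * 15/2^2 + 3 * 4^2
print(c(middle = middle, last = last, total = one_line))
```

When in doubt about precedence, adding extra parentheses costs nothing and makes the intent explicit.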
"<-" is known as the assignment operator in R. To assign the value 3 to the variable name x:
x <- 3
> x <- 3
> x
[1] 3
> y
Error: object 'y' not found
You can name your variables anything you want, but there are a few rules:
v.one and v_one are valid names, but v one is not (because it includes a space).
More information on general R programming style can be found here.
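One way to test a candidate name is base R's make.names(), which rewrites any string into a syntactically valid name. The sketch below compares its output with the input; `is_valid_name` is a hypothetical helper written for this example, not a built-in function.

```r
# Syntactically valid names can be assigned to directly.
v.one <- 1
v_one <- 2

# make.names() converts a string into a valid name, so if the output
# equals the input, the input was already a valid name.
is_valid_name <- function(x) make.names(x) == x   # hypothetical helper

is_valid_name(c("v.one", "v_one", "v one"))
make.names("v one")   # shows what R would turn the invalid name into
```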
Objects in R can be a number of different types. Next, we'll discuss the three types you are most likely to encounter.
character: 'j', 'hello', 'treatment A'
numeric: 1, 550, 3.14
logical: TRUE and FALSE, often used to control programming flow
Data frames hold tabular data and can be created with the data.frame() function:
> data.frame(group = rep(c('trt', 'ctrl'), each=4),
             response = rnorm(8))
group response
1 trt 0.01231700
2 trt 0.55028627
3 trt -0.24411263
4 trt 0.35996580
5 ctrl 0.89960231
6 ctrl -0.76686861
7 ctrl 0.03573818
8 ctrl 1.69159332
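To check what type an object is, class() can be called on it. A minimal sketch follows; the seed value and the name `trial` are arbitrary choices for this example, and the random responses will differ from the printout above.

```r
class('treatment A')   # "character"
class(3.14)            # "numeric"
class(TRUE)            # "logical"

set.seed(1)  # arbitrary seed so the random responses are reproducible
trial <- data.frame(group = rep(c('trt', 'ctrl'), each = 4),
                    response = rnorm(8))
class(trial)           # "data.frame"
table(trial$group)     # each = 4 gives four rows per group
```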
The mean function is used to find the average of a vector of numbers:
> my_data <- c(10, 22, 44, 55, 14, 66)
> mean(my_data)
[1] 35.16667
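To see what mean() is doing, the same number can be computed by hand, and an optional argument shows how a function's behaviour can be adjusted. This is an illustrative sketch; `by_hand` and `with_gap` are names invented here.

```r
my_data <- c(10, 22, 44, 55, 14, 66)

# The mean is the sum of the values divided by how many there are.
by_hand <- sum(my_data) / length(my_data)
all.equal(by_hand, mean(my_data))

# Arguments change how a function behaves: na.rm = TRUE drops missing values.
with_gap <- c(my_data, NA)
mean(with_gap)                 # NA, because one value is missing
mean(with_gap, na.rm = TRUE)   # same as mean(my_data)
```

Typing ?mean at the console lists all of the arguments a function accepts.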
The R workspace can be thought of as a container holding all of the objects you've created during your R session. You can print a list of all of the objects in your current workspace using the ls() function. If we start a new R session, our workspace will be empty:
> ls()
character(0)
And we'll be able to see some objects if we add them:
> x <- 20
> y <- 30
> z <- x + y
> df <- data.frame(nums = 1:10, b = letters[1:10])
> ls()
[1] "df" "x" "y" "z"
save.image: saves a snapshot of your entire workspace to a file
save.image('my-work.RData')
save: saves a snapshot of a few specified objects to a file
To save df, the data.frame we created in the last section:
save(df, file = 'my-dataframe.RData')
load: loads your saved *.RData files back into R
To load df, the data.frame we just saved:
load('my-dataframe.RData')
Use the rm() function to remove objects
rm() lets you remove objects from your R session when they aren't needed:
> ls() # start in an empty workspace
character(0)
> y <- 1
> z <- 1
> ls() # can see the two objects we created
[1] "y" "z"
> rm(y) # remove y
> ls()
[1] "z"
You can remove all objects in your workspace by using ls() to generate a vector of all the objects that have been created, and passing that to the rm() function:
rm(list = ls())
You can also use a button in RStudio's Environment panel to remove all of the objects in your workspace. RStudio will ask whether you are sure you want to delete all objects; choosing Yes permanently deletes every object in the workspace.
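Putting save, rm, and load together, a full round trip might look like the sketch below. It uses a hypothetical object name (`my_df`) and a temporary file from tempfile() rather than the file names used earlier, so nothing in your own project is touched.

```r
# Hypothetical object and a temporary file path, chosen for illustration.
my_df <- data.frame(nums = 1:10, b = letters[1:10])
tmp_file <- tempfile(fileext = ".RData")

save(my_df, file = tmp_file)   # write the object to disk
rm(my_df)                      # remove it from the workspace
exists("my_df")                # FALSE: the object is gone

load(tmp_file)                 # restores my_df under its original name
exists("my_df")                # TRUE: the object is back
unlink(tmp_file)               # delete the temporary file
```

Note that load() restores objects under the names they had when saved, so it can silently overwrite an object of the same name in your current workspace.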
Most general-purpose R packages are hosted on the Comprehensive R Archive Network (CRAN). To install one of these packages, use install.packages("packagename"). You only need to install a package once; after that, load it in each new R session with library(packagename). Here's how to install and load the ggplot2 package.
# Install only once.
install.packages("ggplot2")
# Load the package every time you want to use it.
library(ggplot2)
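Because installation is only needed once, some scripts check whether a package is already available before installing it. One possible sketch uses requireNamespace(), which returns TRUE or FALSE without attaching the package; `install_if_missing` is a hypothetical helper written for this example.

```r
# Hypothetical helper: install a package only when it is not already available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers an install.
install_if_missing("stats")

# For the package used in this tutorial you would run:
# install_if_missing("ggplot2")
# library(ggplot2)
```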
Now that we're feeling a bit more comfortable with the R environment, we'll explore how we can import our own experimental data into R.
Before going any further, please download the zip file below and extract its contents somewhere on your computer.
Use the read.delim function to import our .tsv data frame
read.delim is a special alternative to R's more general read.table function (see ?read.table for details)
read.csv is another special alternative to read.table
read.csv & read.delim have
> gene.exprs.long[1:5, ] # print the first 5 rows
ID Group Gene Expression
1 Sample 1 Group 1 Gene1 9.695228
2 Sample 2 Group 1 Gene1 8.087463
3 Sample 3 Group 1 Gene1 9.885696
4 Sample 4 Group 1 Gene1 7.832890
5 Sample 5 Group 1 Gene1 10.239599
> gene.exprs.long[1:5, 1:2] # print the first 5 rows and columns 1 & 2
ID Group
1 Sample 1 Group 1
2 Sample 2 Group 1
3 Sample 3 Group 1
4 Sample 4 Group 1
5 Sample 5 Group 1
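The import call itself is not shown above, so here is a self-contained sketch of the same workflow. It writes three of the rows printed above to a temporary file (a stand-in for the downloaded data file, whose name is not assumed here), reads it with read.delim, and indexes the result; `demo` and `tsv_file` are names chosen for this example.

```r
# Write a tiny tab-delimited file to a temporary location (stand-in data file).
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c("ID\tGroup\tGene\tExpression",
             "Sample 1\tGroup 1\tGene1\t9.695228",
             "Sample 2\tGroup 1\tGene1\t8.087463",
             "Sample 3\tGroup 1\tGene1\t9.885696"),
           tsv_file)

# read.delim assumes a header row and tab-separated fields by default.
demo <- read.delim(tsv_file)

demo[1:2, ]            # rows 1 and 2, all columns
demo[1:2, 1:2]         # rows 1 and 2, columns 1 and 2
demo[, "Expression"]   # one column selected by name, returned as a vector
demo$Expression        # the same column using $
unlink(tsv_file)       # delete the temporary file
```

The pattern is always data[rows, columns]; leaving either side of the comma blank means "all of them".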
subset(dataset, logical conditions)
> subset(gene.exprs.long, ID == 'Sample 1') # All Sample 1 measures
ID Group Gene Expression
1 Sample 1 Group 1 Gene1 9.695228
51 Sample 1 Group 1 Gene2 4.694323
101 Sample 1 Group 1 Gene3 3.733354
151 Sample 1 Group 1 Gene4 4.874305
> subset(gene.exprs.long, Expression > 10) # gene expression > 10
ID Group Gene Expression
5 Sample 5 Group 1 Gene1 10.23960
6 Sample 6 Group 1 Gene1 11.01402
10 Sample 10 Group 1 Gene1 10.10460
13 Sample 13 Group 2 Gene1 10.21796
16 Sample 16 Group 2 Gene1 10.03342
46 Sample 46 Group 5 Gene1 10.11114
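Conditions passed to subset() can be combined, and the same rows can be selected with bracket indexing. The sketch below uses a small made-up data frame (`demo`, with invented values) that has the same columns as gene.exprs.long.

```r
# Small made-up data frame with the same columns as gene.exprs.long.
demo <- data.frame(
  ID         = c("Sample 1", "Sample 2", "Sample 1", "Sample 2"),
  Group      = c("Group 1", "Group 1", "Group 1", "Group 1"),
  Gene       = c("Gene1", "Gene1", "Gene2", "Gene2"),
  Expression = c(10.2, 8.1, 4.7, 11.3)
)

# Combine conditions with & (and) or | (or).
high_gene1 <- subset(demo, Gene == "Gene1" & Expression > 10)

# The same rows selected with bracket indexing.
same_rows <- demo[demo$Gene == "Gene1" & demo$Expression > 10, ]

# The select argument keeps only the named columns.
subset(demo, Expression > 10, select = c(ID, Expression))
```

subset() is convenient at the console because column names can be used directly, without the demo$ prefix that bracket indexing needs.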
> gene1 <- subset(gene.exprs.long, Gene == 'Gene1') # only data for gene 1
> summary(aov(Expression ~ Group, data = gene1)) # run an ANOVA on it
Df Sum Sq Mean Sq F value Pr(>F)
Group 4 13.08 3.271 4.879 0.00236 **
Residuals 45 30.17 0.670
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
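The printed ANOVA table can also be accessed programmatically, for example to pull out the p-value. The sketch below runs on simulated data (`sim`, five groups of ten values drawn with an arbitrary seed), not on the tutorial's dataset, so its numbers will not match the table above.

```r
set.seed(42)  # arbitrary seed for reproducible simulated data
sim <- data.frame(
  Group      = rep(paste("Group", 1:5), each = 10),
  Expression = rnorm(50, mean = 9, sd = 1)
)

fit <- aov(Expression ~ Group, data = sim)
anova_table <- summary(fit)[[1]]   # the table is the first list element

anova_table[["Df"]]                # 4 for Group, 45 for Residuals
p_value <- anova_table[["Pr(>F)"]][1]
p_value                            # p-value for the Group term
```

The degrees of freedom follow from the design: 5 groups give 5 - 1 = 4, and 50 observations minus 5 groups give 45 residual degrees of freedom, as in the table above.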
There are three main graphing frameworks available in R for creating high quality plots: base graphics, lattice, and ggplot2.
ggplot2!
ggplot2 allows new R users to create high quality data visualizations with little effort using an intuitive plotting syntax:
ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics (taken from ggplot2 site).
qplot() is the easiest way to get started with ggplot2
# Install only once.
install.packages("ggplot2")
# Load the package every time you want to use it.
library(ggplot2)
A basic qplot call:
qplot(x = Gene, y = Expression, data = gene.exprs.long)
qplot guesses what the best graphical display would be given your data
Use the geom option to choose the display yourself:
qplot(x = Gene, y = Expression, data = gene.exprs.long, geom = 'boxplot')
The color option adds color!
qplot(x = Gene, y = Expression, data = gene.exprs.long, color = Gene)
Use fill for some geoms:
qplot(x = Gene, y = Expression, data = gene.exprs.long, geom = 'boxplot',
      fill = Gene)
qplot(x = Group, y = Expression, data = gene.exprs.long, fill = Group,
facets = ~ Gene, geom = 'boxplot')
qplot(x = Group, y = Expression, data = gene.exprs.long, color = Group,
facets = ~Gene, geom = "jitter") + theme_bw() +
scale_color_brewer(type = 'qual', palette = "Dark2") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
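Newer ggplot2 releases mark qplot() as deprecated in favor of the full ggplot() interface, so it may help to see the faceted boxplot written that way. This is a sketch on made-up data (`demo`, random values with an arbitrary seed) with the same columns as gene.exprs.long; the requireNamespace() guard is only there so the code does nothing if ggplot2 is not installed.

```r
# Made-up data with the same columns used in the plots above.
set.seed(1)  # arbitrary seed
demo <- data.frame(
  Group      = rep(c("Group 1", "Group 2"), each = 10, times = 2),
  Gene       = rep(c("Gene1", "Gene2"), each = 20),
  Expression = rnorm(40, mean = 8)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Roughly equivalent to:
  # qplot(x = Group, y = Expression, data = demo, fill = Group,
  #       facets = ~ Gene, geom = 'boxplot')
  p <- ggplot(demo, aes(x = Group, y = Expression, fill = Group)) +
    geom_boxplot() +
    facet_wrap(~ Gene)
  # print(p) draws the plot; at the console, typing p does the same.
}
```

The mapping is direct: qplot's x, y, and fill arguments go inside aes(), the geom option becomes a geom_*() layer, and facets becomes facet_wrap().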
Tab-delimited text files end in .txt or .tsv.
ID        Group    Gene1   Gene2   Gene3  Gene4
Sample 1  Group 1  9.695   4.694   3.733  4.874
Sample 2  Group 1  8.087   3.276   2.220  3.095
Sample 3  Group 1  9.885   6.297   5.842  8.233
Sample 4  Group 1  7.832   -1.286  2.594  -1.089
Sample 5  Group 1  10.239  4.474   3.300  3.377
File -> Save As -> Tab-Delimited Text
Comma-separated values files end in .csv.
"ID","Group","Gene1","Gene2","Gene3","Gene4"
"Sample 1","Group 1",9.695,4.694,3.733,4.874
"Sample 2","Group 1",8.087,3.276,2.220,3.095
"Sample 3","Group 1",9.885,6.297,5.842,8.233
"Sample 4","Group 1",7.832,-1.286,2.594,-1.089
"Sample 5","Group 1",10.239,4.474,3.300,3.377
File -> Save As -> Comma Separated Values
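To see that the two formats carry the same information, the five example rows can be written out both ways from R and read back in. This is a sketch that uses temporary files; the names `wide`, `tsv_file`, and `csv_file` are chosen for illustration.

```r
# The five example rows shown above.
wide <- data.frame(
  ID    = paste("Sample", 1:5),
  Group = "Group 1",
  Gene1 = c(9.695, 8.087, 9.885, 7.832, 10.239),
  Gene2 = c(4.694, 3.276, 6.297, -1.286, 4.474),
  Gene3 = c(3.733, 2.220, 5.842, 2.594, 3.300),
  Gene4 = c(4.874, 3.095, 8.233, -1.089, 3.377)
)

tsv_file <- tempfile(fileext = ".tsv")   # temporary paths for this example
csv_file <- tempfile(fileext = ".csv")

# Write the same table in both formats.
write.table(wide, tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)
write.csv(wide, csv_file, row.names = FALSE)

from_tsv <- read.delim(tsv_file)   # reads tab-separated text
from_csv <- read.csv(csv_file)     # reads comma-separated text

all.equal(from_tsv, from_csv)      # both routes give the same data frame
unlink(c(tsv_file, csv_file))      # delete the temporary files
```

Either format works; what matters is matching the reader (read.delim or read.csv) to the separator actually used in the file.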